In the pursuit of clinical utility, neuroimaging researchers of psychiatric and neurological
illness are increasingly using analyses, such as support vector machines, that allow
inference at the single-subject level. Recent studies employing single-modality data,
however, suggest that classification accuracies must be improved for such utility to be
realized. One possible solution is to integrate different data types to provide a single
combined output classification, either by generating a single decision function based on an
integrated kernel matrix or by creating an ensemble of multiple single-modality classifiers
and integrating their predictions. Here, we describe four integrative approaches: (1) an 
un-weighted sum of kernels, (2) multi-kernel learning, (3) prediction averaging, and (4) 
majority voting, and compare their ability to enhance classification accuracy relative to 
the best single-modality classification accuracy. We achieve this by integrating structural, 
functional, and diffusion tensor magnetic resonance imaging data, in order to compare 
ultra-high risk (n = 19), first episode psychosis (n = 19) and healthy control subjects
(n = 23). Our results show that (i) whilst integration can enhance classification accuracy by 
up to 13%, the frequency of such instances may be limited, (ii) where classification can be 
enhanced, simple methods may yield greater increases relative to more computationally 
complex alternatives, and, (iii) the potential for classification enhancement is highly 
influenced by the specific diagnostic comparison under consideration. In conclusion, our 
findings suggest that for moderately sized clinical neuroimaging datasets, combining 
different imaging modalities in a data-driven manner is no "magic bullet" for increasing 
classification accuracy. However, it remains possible that this conclusion is dependent 
on the use of neuroimaging modalities that had little, or no, complementary information 
to offer one another, and that the integration of more diverse types of data would have 
produced greater classification enhancement. We suggest that future studies ideally 
examine a greater variety of data types (e.g., genetic, cognitive, and neuroimaging) in 
order to identify the data types and combinations optimally suited to the classification of 
early stage psychosis. 
INTRODUCTION

In response to growing demand for clinically translatable research (Matthews et al., 2006; Borgwardt and Fusar-Poli, 2012), neuroimaging investigators of psychiatric and neurological illness are increasingly using analyses that allow inference at the single-subject level (Orru et al., 2012). One such method is the support vector machine (SVM) classifier, which is able to classify individuals into predefined groups and yield an associated accuracy indicative of how well it will generalize to future individual cases. A type of multivariate supervised pattern recognition algorithm, SVM has become progressively widespread in both neurology and psychiatry as a means of revealing patterns of alteration in patients relative to healthy controls (HCs) that may potentially be used to (i) inform clinical diagnosis and/or (ii) predict treatment response (Orru et al., 2012). When considering the ultimate development of SVM as a real-world clinical aid, however, arguably greater levels of discriminative accuracy are required than those currently reported.
One method proposed to achieve this is the integration of data from different modalities, so that complementary information from each modality can be exploited (Kittler et al., 1998). This is based on the premise that algorithms generated using different types of data will base their classifications on distinct patterns of alteration and will also make distinct misclassifications. Combining different data types within a single SVM or, alternatively, creating an ensemble of multiple single-modality SVMs therefore both aim to increase accuracy by deriving a consensus decision, as opposed to relying on the single decision of a single-modality classifier (Kittler et al., 1998). To date, existing applications involving Alzheimer's patients have generally shown encouraging, albeit modest, increases in predictive accuracy ranging between 3 and 7% relative to the best single modality classification accuracy (BSMCA) (Fan et al., 2008; Hinrichs et al., 2011; Zhang et al., 2011). With specific reference to psychosis, in comparison, only one recent study, investigating chronic schizophrenia (ChSz), has been published, in which the authors reported that, using an integrative approach, they were able to classify patients from HCs with 87.25% accuracy (Yang et al., 2010), representing an increase of approximately 5% relative to the BSMCA. Despite these promising results, these four studies employed only two methods, or variations thereof, for integrating data within SVM, namely, majority voting and multi-kernel learning. Though alternative methods are available, no systematic investigation has yet been conducted examining the relative efficacies of a range of distinct integrative methods to combine multimodal neuroimaging data within the same clinical sample. It therefore remains unclear to what extent combining data modalities can improve accuracy in a typical neuroimaging sample and, if so, which integrative approach provides the greatest classification increase and in what context.

In the current investigation, we provide a brief review of four different approaches that can be used to integrate data from multiple sources, namely, (1) an un-weighted "simple" sum of kernels (SK), (2) multi-kernel learning (MKL), (3) prediction averaging (AV), and (4) majority voting (MV). These particular methods were chosen on the basis that they (i) are frequently used in the (limited) psychiatric and neurological literature (Fan et al., 2008; Yang et al., 2010; Hinrichs et al., 2011; Zhang et al., 2011) and/or (ii) are relatively straightforward to implement. We then apply each approach to the same data set in order to empirically examine its potential to enhance classification accuracy relative to the BSMCA. In addition, to investigate how the number of data types combined affects integrated accuracy, we performed multimodal integration using varying numbers of data types.

The data set to which each integrative method was applied is taken from recent work by our own group, in which we assessed the ability of different modalities to classify first episode psychosis (FEP) and ultra-high risk (UHR) subjects from healthy controls (HCs), and from each other. For this study, ethics approval was granted by the local Research Ethics Committee (reference number: 08/H0805/64). Our results showed that, in conjunction with SVM, structural MRI (sMRI) data were able to discriminate UHR subjects from HCs and from FEP subjects with significant (p < 0.05) accuracies of 68.42 and 76.67%, respectively; diffusion tensor imaging (DTI) data were able to discriminate both UHR and FEP subjects from HCs with 65.79% accuracy; and functional MRI (fMRI) data were able to discriminate FEP subjects from HCs and from UHR subjects with up to 68.42 and 73.33% accuracy, respectively (Pettersson-Yeo et al., 2013) (see Table 1). Based on these data, our primary aim was to examine the ability of the four integrative methods outlined to enhance classification accuracy by integrating information from three distinct neuroimaging modalities (sMRI, DTI, and fMRI, the latter comprising one of two functional contrasts) in order to discriminate UHR from HCs, FEP from HCs, or UHR from FEP subjects, relative to the BSMCA for each diagnostic comparison.

Since only a few studies have applied integrative techniques to neuroimaging data, our hypotheses regarding which method might work best were informed by similar work in the field of proteomics (Lewis et al., 2006). Based on this previous work, in which alternative SVM integration methods were applied to the prediction of protein interactions and subsequently compared (Lewis et al., 2006), we hypothesized that (i) for two-modality combinations, an un-weighted SK would perform as well as, if not better than, the more sophisticated, weighted MKL, with AV likely to perform as well as MKL; (ii) for three-modality combinations, MKL and AV would perform as well as, or better than, SK and MV, given their respective abilities to explicitly, or implicitly, dampen the contribution of "noisy" data to the definition of the optimal separating hyperplane (OSH); and (iii) based on the spectrum of different results obtained using single-modality data in conjunction with SVM for each of the three diagnostic comparisons, the ability of each integrative method to enhance classification accuracy would vary depending on the diagnostic comparison to which it was applied.

MATERIALS AND METHODS 
SVM 

Originally developed in the early 1990s (Cortes and Vapnik, 
1995), and stemming from statistical learning theory (Vapnik, 
1999), SVM is a multivariate pattern recognition algorithm well 
suited to binary group classification. The SVM aims to learn a decision function that
correctly predicts the class label (conventionally denoted by $y = +1$ or $y = -1$) for
each data point, based on a set of $m$ training examples $\{\mathbf{x}_i, y_i\}_{i=1}^{m}$,
where $\mathbf{x}_i$ are the data vectors associated with each label. The goal is then to
predict the labels for a set of unseen testing examples (Burges, 1998). Under
the linear kernel formulation employed in the present work, a 
dot product similarity measure was used to represent data in a 
symmetric, positive definite kernel matrix. In this feature space 
SVM can be used to linearly separate groups (i.e., classes) of 
individuals (e.g., FEP and UHR subjects). The linear SVM deci- 
sion function (Equation 1) can be written as the dot product 
between each data vector and a vector of predictive weights (w). 
The predicted class label can then be derived by taking the sign 
of the decision function. The weight vector represents an OSH in 
the input (i.e., voxel) space and can be represented in terms of 
the most difficult data points to classify (referred to as support 
vectors). The optimal weight vector is determined by maximizing 
the margin between groups thus aiming to ensure good general- 
ization to new data, an approximately unbiased estimate of which 
can be obtained using cross-validation (Hastie et al., 2001; Lemm 
et al., 2011). Here, leave-one-out cross-validation (LOOCV) was employed: an iterative process whereby each subject is omitted during the training of the classifier and used as an independent test set to assess the trained classifier's accuracy, with the final reported classification accuracy (i.e., the proportion of subjects correctly classified) representing the average over all iterations.

Table 1 | SVM classification accuracies using single modality data for each diagnostic comparison.

| SVM comparison | sMRI: GM | DTI: FAS | fMRI contrast 1: In > CFI | fMRI contrast 2: Su > CFS | fMRI contrast 3: In > RI | fMRI contrast 4: Su > RS | fMRI contrast 5: Su > In |
|---|---|---|---|---|---|---|---|
| UHR vs. HC (%) | 68.42 (68.42/68.42) | 65.79 (68.42/63.16) | 36.84 (36.84/36.84) | 60.53 (57.89/63.16) | 57.89 (57.89/57.89) | 60.53 (57.89/57.89) | 47.37 (31.58/63.16) |
| FEP vs. HC (%) | 63.16 (57.89/68.42) | 65.79 (68.42/63.16) | 68.42 (63.16/73.68) | 47.37 (42.11/52.63) | 65.79 (63.16/68.42) | 44.74 (36.84/52.63) | 63.16 (47.37/78.95) |
| FEP vs. UHR (%) | 76.67 (80.00/73.33) | 56.67 (46.67/66.67) | 73.33 (66.67/80.00) | 53.33 (40.00/66.67) | 63.33 (53.33/73.33) | 46.67 (46.67/46.67) | 53.33 (40.00/66.67) |

MRI, magnetic resonance imaging; sMRI, structural MRI; DTI, diffusion tensor imaging; fMRI, functional MRI; GM, grey matter; FAS, fractional anisotropy skeleton; In, generation of an overt verbal initiation response; Su, generation of an overt verbal suppression response; RI, repetition of "REST" during the initiation condition; RS, repetition of "REST" during the suppression condition; CFI, visual cross-fixation during the initiation condition; CFS, visual cross-fixation during the suppression condition; UHR, Ultra-High Risk; FEP, First Episode Psychosis; HC, Healthy Control. Figures in brackets are the sensitivity/specificity for each classifier.

Whilst providing an approximately unbiased estimate of generalizability for a given sample, however, we note that this technique does not necessarily offset the impact of a relatively small sample size on generalization to as yet unseen psychosis subjects; larger samples remain the ideal. The SVM objective function is provided in Equations 2 and 3, reflecting the primal and dual space representations, respectively. Here, $\mathbf{w}$ is a vector of predictive weights in the input (primal) space, $b$ denotes the offset, $\xi_i$ denote slack variables, which permit data to be misclassified in the training set, $\alpha_i$ denote Lagrange multipliers (or dual space weights), and $C$ is a parameter regulating the balance between maximizing the margin between data points and allowing misclassification in the training set. For a more detailed description of SVM, see Burges (1998) or Schölkopf and Smola (2002). For an overview of SVM in the context of neuroimaging, see Pereira et al. (2009) and Lemm et al. (2011).

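$$f(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{T}\mathbf{x} + b \tag{1}$$

$$\begin{aligned} \underset{\mathbf{w},\, b,\, \boldsymbol{\xi}}{\text{minimize}} \quad & \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{m} \xi_i \\ \text{subject to} \quad & y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i \end{aligned} \tag{2}$$

$$\begin{aligned} \underset{\boldsymbol{\alpha}}{\text{maximize}} \quad & \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(\mathbf{x}_i, \mathbf{x}_j) \\ \text{subject to} \quad & 0 \le \alpha_i \le C \quad \forall i, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0 \end{aligned} \tag{3}$$
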
where $k(\mathbf{x}_i, \mathbf{x}_j)$ is the kernel, here taken to be a (linear) dot product between data samples. Equations 2 and 3 are convex optimization problems and can be efficiently optimized with conventional quadratic programming solvers. In the present work, the LIBSVM library (Chang and Lin, 2011) was employed, as implemented in the PROBID software toolbox (http://www.brainmap.co.uk/probid.htm). As is common in neuroimaging applications, the value of the SVM regularization parameter C was fixed to one.
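The LOOCV procedure described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the PROBID/LIBSVM pipeline used in the study: to stay dependency-free, the `nearest_mean_trainer` function is a hypothetical stand-in that produces a linear decision function of the same form as Equation 1, f(x) = w^T x + b, but does not maximize the margin. A genuine linear SVM trainer with the same call signature could be passed to `loocv_accuracy` in its place; the toy data are invented.

```python
from typing import Callable, List

Vector = List[float]
DecisionFunction = Callable[[Vector], float]


def dot(a: Vector, b: Vector) -> float:
    """Linear (dot product) similarity between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_mean_trainer(xs: List[Vector], ys: List[int]) -> DecisionFunction:
    """Hypothetical stand-in for a linear SVM trainer.

    Returns f(x) = w.x + b with w = mean(class +1) - mean(class -1) and b chosen
    so that the decision boundary bisects the two class means. Unlike an SVM,
    it does not maximize the margin.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    dim = len(xs[0])
    mu_pos = [sum(v[d] for v in pos) / len(pos) for d in range(dim)]
    mu_neg = [sum(v[d] for v in neg) / len(neg) for d in range(dim)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    b = -dot(w, midpoint)
    return lambda x: dot(w, x) + b


def loocv_accuracy(
    xs: List[Vector],
    ys: List[int],
    trainer: Callable[[List[Vector], List[int]], DecisionFunction],
) -> float:
    """Leave-one-out CV: each subject is held out once as the test set."""
    correct = 0
    for i in range(len(xs)):
        # Train on every subject except subject i.
        f = trainer(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # The predicted label is the sign of the decision function (0 -> +1).
        predicted = 1 if f(xs[i]) >= 0 else -1
        correct += int(predicted == ys[i])
    # Proportion of held-out subjects classified correctly.
    return correct / len(xs)


# Toy data: two well-separated groups of three "subjects" with two features each.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = [-1, -1, -1, 1, 1, 1]
print(loocv_accuracy(X, y, nearest_mean_trainer))  # 1.0
```

Passing the trainer as a function keeps the cross-validation loop independent of the classifier, which is also how an integrated classifier (SK, MKL, AV, or MV) could be slotted into the same loop.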

COMBINING CLASSIFIERS 

In order to generate a single output decision from multiple sources, two options are available: (1) find a linear combination of the kernel matrices representing each data modality in order to train and test a single SVM, in which case the predictive weights are estimated jointly from all data; or (2) train multiple single-modality classifiers and subsequently combine their output decisions to generate a single decision function using label fusion techniques, in which case the weight vectors for each data type are estimated independently. Of the four approaches used in the current investigation, SK and MKL are variations of option 1, and MV and AV are variations of option 2. Figure 1 depicts a representative pipeline showing the steps used for each approach. For all classifiers, classification accuracy can be calculated by dividing the number of correct predictions by the total number of predictions. A balanced accuracy for two groups can be obtained as the mean of the classifier's sensitivity and specificity; note that this is equivalent to the standard definition of accuracy (i.e., [True Positives + True Negatives]/[Total Number of Subjects]) when equal-sized groups are used, as is the case here.
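The equivalence between accuracy and balanced accuracy for equal-sized groups can be checked with a short calculation. The sketch below is illustrative: the function name is hypothetical and the predictions are invented, chosen so that 13 of 19 subjects are correct in each group, which reproduces the 68.42 (68.42/68.42) pattern seen in Table 1.

```python
from typing import List, Tuple


def classification_metrics(
    y_true: List[int], y_pred: List[int], positive: int = 1
) -> Tuple[float, float, float, float]:
    """Return (accuracy, balanced accuracy, sensitivity, specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    sensitivity = tp / n_pos          # true positive rate
    specificity = tn / n_neg          # true negative rate
    accuracy = (tp + tn) / len(y_true)
    balanced = (sensitivity + specificity) / 2
    return accuracy, balanced, sensitivity, specificity


# Equal group sizes (19 vs. 19): 13 of 19 correct in each group.
y_true = [1] * 19 + [-1] * 19
y_pred = [1] * 13 + [-1] * 6 + [-1] * 13 + [1] * 6
acc, bal, sens, spec = classification_metrics(y_true, y_pred)
print(round(100 * acc, 2), round(100 * bal, 2))  # 68.42 68.42

# Unequal group sizes: the two measures diverge.
print(classification_metrics([1, 1, 1, -1], [1, 1, 1, 1])[:2])  # (0.75, 0.5)
```

The second call shows why balanced accuracy is preferred for unequal groups: labelling everyone as positive scores 75% accuracy but only 50% balanced accuracy.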

Un-weighted simple sum of kernels 

A well-known property of kernels is that they can be combined via operations such as addition, non-negative scaling, and element-wise multiplication to yield another valid kernel. As described, a linear kernel matrix is used in the present work to represent the similarity between data points within each data modality (Equation 4). Thus, a simple way to combine data modalities is to add their kernel matrices. Importantly, different modalities may have different numbers of features and may also be scaled differently. To account for this, each kernel was first normalized before being summed with the others to create a new kernel matrix representing data from all modalities (Equation 4; the example shown combines two data types only, i.e., $K_1$ and $K_2$). This is equivalent to first dividing each data vector by its Euclidean norm, then concatenating the feature vectors for all modalities. Under this framework, the data from each source are assigned an equal weighting in terms of their contribution to defining the OSH. An SVM is then trained and tested (Equations 2 and 3) using this new integrated kernel ($K'$), such that classification is based on all data sources.

$$K'_{ij} = \frac{(K_1)_{ij}}{\sqrt{(K_1)_{ii}\,(K_1)_{jj}}} + \frac{(K_2)_{ij}}{\sqrt{(K_2)_{ii}\,(K_2)_{jj}}} \tag{4}$$

FIGURE 1 | Flowchart depicting the processing pipeline for each type of integrative approach. MRI, magnetic resonance imaging; sMRI, structural MRI; DTI, diffusion tensor imaging; fMRI, functional MRI; K_s/f/d, kernel matrix for sMRI/fMRI/DTI data; SVM, support vector machine; K', integrated kernel matrix; f(x), SVM decision function; w', optimum vector of predictive weights obtained using K'; w_k, optimum weight coefficient assigned to each base kernel.
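Equation 4 and its equivalence to concatenating unit-norm feature vectors can be checked numerically. The following is a minimal pure-Python sketch; the function names are hypothetical, the toy matrices are arbitrary (the two "modalities" deliberately have different numbers of features), and it assumes no subject has an all-zero feature vector, since that would make the normalization undefined.

```python
import math
from typing import List

Matrix = List[List[float]]


def linear_kernel(X: List[List[float]]) -> Matrix:
    """Gram matrix of dot products between all pairs of subjects."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def normalize_kernel(K: Matrix) -> Matrix:
    """K_ij / sqrt(K_ii * K_jj); assumes no all-zero feature vectors."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)] for i in range(n)]


def simple_sum_of_kernels(kernels: List[Matrix]) -> Matrix:
    """Un-weighted sum of normalized kernels (Equation 4, for any number of kernels)."""
    normed = [normalize_kernel(K) for K in kernels]
    n = len(normed[0])
    return [[sum(Kn[i][j] for Kn in normed) for j in range(n)] for i in range(n)]


def unit(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Two hypothetical "modalities" for three subjects, with different feature counts.
X1 = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
X2 = [[2.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 3.0, 4.0]]
K_sum = simple_sum_of_kernels([linear_kernel(X1), linear_kernel(X2)])

# Equivalent view: scale each subject's vector to unit norm, then concatenate.
X_cat = [unit(a) + unit(b) for a, b in zip(X1, X2)]
K_cat = linear_kernel(X_cat)

print(max(abs(K_sum[i][j] - K_cat[i][j]) for i in range(3) for j in range(3)))
```

After normalization every diagonal entry of each base kernel equals 1, so no modality can dominate the summed kernel merely because it has more voxels or larger raw values.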



Multi-kernel learning 

The MKL approach provides a means of automatic kernel combination with the aim of producing a "best kernel" from a linear combination of q input ("base") kernels, such that the optimal kernel is given by:

$$K = \sum_{k=1}^{q} \beta_k K_k$$

where $\beta_k$ are the weights for each base kernel. These are optimized simultaneously with the dual predictive weights in the ordinary SVM framework. Many different optimization and regularization frameworks exist for MKL (e.g., Lanckriet et al., 2004; Sonnenburg et al., 2006). In this work, we employed an MKL formulation based on elastic net regularization (see Tomioka and Suzuki, 2010, for details). Under this framework, the regularization penalty of the model is a combination of L1 and L2 components. A tuning parameter $\lambda \in [0, 1]$ governs the relative contribution of the respective L1 and L2 regularization terms, such that $\lambda = 0$ denotes an extremely sparse (L1) model, in which many kernel weighting coefficients are pushed to zero, and $\lambda = 1$ denotes a uniformly weighted combination of kernels. The elastic net regularizer thus aims to find an optimal balance between enforcing sparsity and allowing kernels that are correlated with one another to participate in the model. We employed the implementation provided in the SHOGUN toolbox (Sonnenburg et al., 2010; http://www.raetschlab.org/suppl/shogun). To ensure optimal performance in MKL, proper tuning of the regularization parameters ($C$ and $\lambda$) is crucial. Here, this was achieved using a nested cross-validated grid search, in which $C$ took values from 0.001 to 1000 (six steps, each increasing the value by one order of magnitude) and $\lambda$ took values from 0.1 to 1 in steps of 0.1.
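The hyperparameter grid and the nested structure of the search can be sketched as follows. The source gives the grid values and states that the search was nested within cross-validation, but not how the inner loop was organized, so this is an assumption-laden skeleton rather than the SHOGUN code used: `inner_score` and `outer_correct` are hypothetical callbacks (for instance wrapping an elastic net MKL solver), not SHOGUN API calls. The key property it illustrates is that (C, lambda) are re-selected in every outer fold without ever seeing the held-out subject.

```python
from itertools import product
from typing import Callable, List, Tuple

# C: 0.001 to 1000, one order of magnitude per step (six steps, seven values).
C_VALUES = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# lambda: 0.1 to 1.0 in steps of 0.1 (ten values); round() removes float drift.
LAMBDA_VALUES = [round(0.1 * k, 1) for k in range(1, 11)]
GRID: List[Tuple[float, float]] = list(product(C_VALUES, LAMBDA_VALUES))

# inner_score(train_indices, C, lam) -> accuracy of an inner CV run that uses
# the training subjects only (hypothetical callback).
InnerScore = Callable[[List[int], float, float], float]
# outer_correct(train_indices, test_index, C, lam) -> True if the held-out
# subject is classified correctly by a model trained with (C, lam) (hypothetical).
OuterCorrect = Callable[[List[int], int, float, float], bool]


def select_hyperparameters(train: List[int], inner_score: InnerScore) -> Tuple[float, float]:
    """Return the (C, lambda) pair with the best inner-CV score (first wins ties)."""
    best_score, best_pair = float("-inf"), GRID[0]
    for C, lam in GRID:
        score = inner_score(train, C, lam)
        if score > best_score:
            best_score, best_pair = score, (C, lam)
    return best_pair


def nested_loocv(n_subjects: int, inner_score: InnerScore, outer_correct: OuterCorrect) -> float:
    """Outer LOOCV in which hyperparameters are chosen without the held-out subject."""
    correct = 0
    for test in range(n_subjects):
        train = [i for i in range(n_subjects) if i != test]
        C, lam = select_hyperparameters(train, inner_score)
        correct += int(outer_correct(train, test, C, lam))
    return correct / n_subjects


print(len(GRID))  # 70
```

Selecting (C, lambda) on the full sample and then reporting LOOCV accuracy would leak information from the test subjects into model selection; the nesting avoids this at the cost of 70 inner runs per outer fold.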

Averaging 

In contrast to SK, or MKL, AV integrates modalities at the level 
of predictions (i.e., forming an ensemble decision after each base 
classifier has been trained and tested on a single modality; see 
Figure 1). Integration is achieved by taking the mean of the pre- 
dictive function values over all modalities and computing its sign 
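to derive an average class prediction. Hence, for a given subject $i$, a base classifier is trained for each modality (Equation 3), which we denote by $f(\mathbf{x}, \mathbf{w}_c)$, $c = 1, \ldots, q$, and the final class based on integrated data using AV is predicted by:

$$\hat{y}_i = \operatorname{sign}\left( \frac{1}{q} \sum_{c=1}^{q} f(\mathbf{x}_i, \mathbf{w}_c) \right)$$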
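Prediction averaging reduces to a one-line fusion rule once the base decision values are available. The sketch below is illustrative; the function name is hypothetical, the decision values are invented, and assigning a mean of exactly zero to the +1 class is an arbitrary convention the source does not specify. In the first example one confident positive decision outweighs two weak negative ones, whereas a majority vote over the same three signs would return -1; this is the sense in which AV uses the magnitude of each decision value as well as its sign.

```python
from typing import List


def average_prediction(decision_values: List[float]) -> int:
    """Prediction averaging (AV): sign of the mean SVM decision value.

    decision_values holds f(x_i, w_c) for c = 1..q, one value per base classifier.
    A mean of exactly zero is assigned to +1 here (an arbitrary convention).
    """
    mean = sum(decision_values) / len(decision_values)
    return 1 if mean >= 0 else -1


# One confident positive decision outweighs two weak negative decisions.
print(average_prediction([0.9, -0.2, -0.3]))  # 1
print(average_prediction([-0.5, 0.1]))        # -1
```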
Majority voting

Similar to AV, MV also performs integration at the level of the 
predictions. However, for MV only the sign (i.e., binary out- 
come) of the decision function is considered, rather than its sign 
and magnitude as in AV. Under the MV approach, the final class 
label is therefore determined by assigning the sample to the class 
obtaining the largest number of predictions amongst the base 
classifiers. 

Since MV relies only on the binary outcome, ties may occur when an even number of data types are combined; these must be broken using an arbitrary heuristic chosen a priori. In the current investigation, MV was therefore not performed for data combined from two modalities, because in such cases ties are very likely and the final classification would be strongly influenced by the heuristic chosen.
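The tie problem described above is easy to see in code. In this illustrative sketch (hypothetical function name, invented decision values), a tie is returned as `None` rather than silently resolved, and counting a decision value of exactly zero as a +1 vote is an arbitrary convention. The final assertion confirms that three modalities can never tie, because the sum of three votes of +1 or -1 is always odd.

```python
from itertools import product
from typing import List, Optional


def majority_vote(decision_values: List[float]) -> Optional[int]:
    """Majority voting (MV): only the sign of each base decision value counts.

    Returns None for a tie, which must then be broken by a heuristic fixed a priori.
    Zero decision values are counted as +1 votes here (an arbitrary convention).
    """
    votes = sum(1 if v >= 0 else -1 for v in decision_values)
    if votes > 0:
        return 1
    if votes < 0:
        return -1
    return None


print(majority_vote([0.9, -0.2, -0.3]))  # -1
print(majority_vote([0.4, -1.2]))        # None (two modalities disagree)

# With three modalities a tie is impossible: the sum of three +/-1 votes is odd.
assert all(majority_vote(list(s)) is not None for s in product([1.0, -1.0], repeat=3))
```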

DATA USED FOR SVM INTEGRATION 

Alterations in grey matter (GM), white matter (WM), and neurofunction represent some of the most robust indices of individuals in the early stages of psychosis (Fusar-Poli et al., 2007, 2011; Peters et al., 2010; Pettersson-Yeo et al., 2011). Furthermore, there has been increasing demand for such metrics to be used for direct clinical benefit (Matthews et al., 2006; Borgwardt and Fusar-Poli, 2012). In this context, the current investigation used a combinative approach focusing specifically on these neuroimaging data types. From work conducted by our own group, measures of GM, WM, and neurofunction were available for 19 FEP, 19 UHR, and 23 HC subjects. Matching subjects for age and gender for the purposes of classification resulted in 19 FEP and HC subject pairs, 19 UHR and HC subject pairs, and 15 FEP and UHR subject pairs (see Table 2 for a detailed characterization of subject groups; for full details of how these data were acquired, we refer the reader to Pettersson-Yeo et al., 2013). In brief, these data were obtained as follows: (i) GM images with 1.5 mm³ isotropic resolution, registered to MNI space, were created from T1-weighted structural scans preprocessed using the unified segmentation procedure in conjunction with a fast diffeomorphic image registration algorithm (DARTEL), with an additional modulation step conserving the total amount of GM in each voxel after registration (Ashburner and Friston, 2005; Ashburner, 2007), implemented in SPM8 (http://www.fil.ion.ucl.ac.uk) running under Matlab 7.1 (MathWorks, USA). As a final step, images were smoothed using a 6 mm full-width at half-maximum (FWHM) isotropic Gaussian kernel (Ashburner and Friston, 2009); (ii) for measures of WM, fractional anisotropy (FA) "skeletons" were used. These were generated from DTI data, which were first preprocessed using ExploreDTI software (Leemans et al., 2009), including the RESTORE algorithm (Chang et al., 2005), to create FA maps corrected for eddy current distortion and head motion, with b-matrix reorientation and rejection of data outliers. These maps were then entered into the Tract-Based Spatial Statistics (TBSS) software package (Smith et al., 2006) to create FA "skeletons" depicting each subject's unique WM network and the associated FA-defined integrity of each voxel; (iii) the fMRI contrast images used were generated from an fMRI-adapted Hayling sentence completion task. This involved subjects being shown sentence stems with the last word missing, for which they had to generate an overt response with a word that either made sense (i.e., Initiation) or made no sense (i.e., Suppression) with the preceding stem. Blocks of five trials were interspersed with a rest condition following an ABABAB block design, in which subjects were presented with the word "REST", which they were instructed to repeat aloud, followed by a visual fixation cross. Functional images were preprocessed and analyzed following the standard SPM8 functional imaging pipeline (http://www.fil.ion.ucl.ac.uk) running under Matlab 7.1 (MathWorks, USA). Using the parameter estimates obtained for the task's six experimental conditions, namely (1) generation of an overt verbal initiation response (In); (2) generation of an overt verbal suppression response (Su); (3) repetition of "REST" during the initiation condition (RI); (4) repetition of "REST" during the suppression condition (RS); (5) visual cross-fixation during the initiation condition (CFI); and (6) visual cross-fixation during the suppression condition (CFS), five contrasts of interest were computed: Su > In, Su > RS, In > RI, Su > CFS, and In > CFI. Of the five tested in the previous work, the two primary fMRI contrasts selected for inclusion here were chosen because the conditions being contrasted represent the most cognitively divergent of the five available, and were therefore most likely to result in the greatest activation differences. These were (i) generation of an overt verbal initiation response > visual cross-fixation during the initiation condition (In > CFI), and (ii) generation of an overt verbal suppression response > visual cross-fixation during the suppression condition (Su > CFS) (see Pettersson-Yeo et al., 2013 for more detail). For completeness, however, integrated classification was also performed by combining all seven kernels available from the previous study (i.e., GM and WM plus the five fMRI contrasts), allowing us to investigate the impact that integrating greater numbers of kernels has on classification accuracy.

For each modality (GM, fMRI, and DTI), all voxels within each subject's image were used as features for the SVM, with a whole-brain mask used to remove any voxels outside the brain.

SVM INTEGRATION: AN EMPIRICAL COMPARISON 

In order to measure the relative ability of each technique to 
increase classification accuracy based on the integration of data 
from different modalities overall, a non-parametric McNemar's 
test was performed comparing the integrated accuracies achieved 
by each method against every other method, collapsed across 
binary diagnostic comparisons for two-way, three-way, and all 
data kernels combined. The results of these tests are presented 
in Figures 3-5 alongside graphic visualizations showing the rel- 
ative difference between the classification accuracy achieved by 
each integrative method and the BSMCA, for each diagnostic con- 
trast, for combinations of two (Figure 3) and three (Figure 4) 
data types, in addition to all seven available kernels (Figure 5). 

McNemar's tests with Holm-Bonferroni correction (Holm, 1979) were also performed, comparing the subject classifications of each individual integrated classifier with those of the classifier achieving the corresponding BSMCA for each contrast, in order to identify whether any individual observed difference was statistically significant.
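Both tests used above operate on simple counts and can be sketched without external libraries. The source does not say whether the exact or the asymptotic (chi-square) variant of McNemar's test was used; the exact binomial version is assumed here. Function names and the example data are hypothetical. Only discordant subjects (classified correctly by one classifier and incorrectly by the other) enter the McNemar test, and the Holm procedure compares the sorted p-values against progressively less strict thresholds, stopping at the first failure.

```python
import math
from typing import List


def mcnemar_exact(correct_a: List[bool], correct_b: List[bool]) -> float:
    """Two-sided exact McNemar test on paired per-subject correctness.

    Only discordant subjects matter: n01 (A right, B wrong) and n10 (A wrong,
    B right). Under H0 each discordant subject is equally likely to fall either
    way, so the smaller count follows Binomial(n01 + n10, 0.5).
    """
    n01 = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    n10 = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = n01 + n10
    if n == 0:
        return 1.0
    k = min(n01, n10)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def holm_bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: test p-values in ascending order against alpha/(m - rank)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical: the integrated classifier is right on 6 subjects the BSMCA classifier missed.
bsmca = [True] * 20 + [False] * 6
integrated = [True] * 26
print(mcnemar_exact(integrated, bsmca))     # 0.03125
print(holm_bonferroni([0.01, 0.04, 0.03]))  # [True, False, False]
```

With samples of 30 to 38 subjects, the number of discordant subjects is small, which is one reason individual differences of a few percentage points rarely reach significance.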



INTEGRATED CLASSIFICATION ACCURACIES 

Whilst the primary aim of the study was to investigate the relative ability of different integrative methods to enhance classification accuracy relative to the BSMCA, permutation tests were also conducted to examine the significance of each integrated classification accuracy relative to chance. First, generalizability was tested using LOOCV. Subjects were then randomly assigned to a class and the LOOCV cycle repeated 1000 times. This provided a distribution of accuracies reflecting the null hypothesis that the integrated classifier did not exceed chance. The number of times the permuted accuracy was greater than or equal to the true accuracy was then divided by 1000 to estimate a p-value. To correct for multiple comparisons, a Holm-Bonferroni step-down procedure (Holm, 1979) was employed (see Figure 2).
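The permutation scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: `accuracy_fn` is a hypothetical callback standing in for a full LOOCV run of an integrated classifier, and the random class assignment is implemented as a shuffle of the true labels, which keeps the group sizes fixed (the source does not state this detail). The estimator follows the text (count divided by 1000) and can therefore return exactly zero; some authors use (count + 1)/(n + 1) instead.

```python
import random
from typing import Callable, List


def permutation_p_value(
    labels: List[int],
    accuracy_fn: Callable[[List[int]], float],
    n_permutations: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate p as (# permuted accuracies >= true accuracy) / n_permutations.

    accuracy_fn(labels) is a hypothetical callback that reruns the full LOOCV
    cycle with the given labels and returns the resulting accuracy.
    """
    rng = random.Random(seed)
    true_accuracy = accuracy_fn(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        permuted = labels[:]   # copy so the true labels are untouched
        rng.shuffle(permuted)  # random reassignment of subjects to classes
        if accuracy_fn(permuted) >= true_accuracy:
            at_least_as_good += 1
    return at_least_as_good / n_permutations


# Toy stand-in: "accuracy" is agreement with the true labels, so only the
# unpermuted labels score 1.0 and the permutation p-value should be near zero.
true_labels = [1] * 15 + [-1] * 15


def agreement(labs: List[int]) -> float:
    return sum(a == b for a, b in zip(labs, true_labels)) / len(labs)


print(permutation_p_value(true_labels, agreement))
```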

RESULTS 

In a substantial majority of comparisons across two, three, and 
seven kernel combinations, the BSMCA was higher than any of 
the classifier combination methods evaluated (see Figures 3-5 
and supplementary material). The minority of comparisons 
for which individual classifier combination methods produced 
higher accuracy than the BSMCA is reported below. 

UN-WEIGHTED SUM OF KERNELS 
Data integrated from two modalities 

Using SK, the ability of sMRI and DTI data combined to differentiate FEP from HC subjects was increased to 71.05%, representing an approximate increase of 5% relative to the BSMCA. Furthermore, by combining DTI and fMRI data using an SK approach, it was possible to discriminate FEP from UHR subjects with 83.33% accuracy, representing an approximate increase of 10% relative to the BSMCA (Figures 2, 3).
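The gains reported in this section are differences, in percentage points, between an integrated accuracy and the BSMCA, which we read as the highest single-modality accuracy among the base kernels being combined (Table 1). The sketch below reproduces that arithmetic; the dictionary transcribes part of the FEP vs. UHR row of Table 1 and the function name is hypothetical. The fMRI contrast in the example is assumed to be In > CFI, the contrast consistent with the reported 10% gain.

```python
from typing import Dict, List

# Single-modality accuracies (%) for FEP vs. UHR, transcribed from Table 1.
FEP_VS_UHR: Dict[str, float] = {
    "sMRI": 76.67,
    "DTI": 56.67,
    "fMRI In > CFI": 73.33,
    "fMRI Su > CFS": 53.33,
}


def gain_over_bsmca(integrated: float, base_kernels: List[str], table: Dict[str, float]) -> float:
    """Integrated accuracy minus the best accuracy among the kernels combined."""
    bsmca = max(table[k] for k in base_kernels)
    return round(integrated - bsmca, 2)  # difference in percentage points


# SK, DTI + fMRI (In > CFI), FEP vs. UHR: 83.33% vs. a BSMCA of 73.33%.
print(gain_over_bsmca(83.33, ["DTI", "fMRI In > CFI"], FEP_VS_UHR))  # 10.0
```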

Data integrated from three modalities 

By combining three different data types using SK, it was not possible to increase classification accuracy relative to the BSMCA for any of the diagnostic comparisons (see Figure 4).

Data integrated from seven kernels across three modalities 

Based on the integration of seven kernels encompassing three data 
types, SK was unable to increase classification accuracy for any of 
the diagnostic comparisons relative to the BSMCA (see Figure 5). 

MULTI-KERNEL LEARNING

Data integrated from two modalities 

Using MKL, the ability of sMRI and fMRI data combined to differentiate FEP from HC subjects, and UHR from HC subjects, was increased to 73.68%, representing in both comparisons an approximate increase of 5% relative to the BSMCA (see Figures 2, 3).

Data integrated from three modalities 

Based on the integration of three data types, MKL was unable to 
increase classification accuracy for any of the diagnostic compar- 
isons relative to the BSMCA (see Figure 4). 
FIGURE 2 | Classification accuracies achieved for discriminating FEP, UHR, and HC subjects by integrating data in two-, three-, and seven-way combinations using SK, MKL, AV, or MV. SK, Un-weighted simple sum of kernels; MKL, Multi-Kernel Learning; AV, Prediction Averaging; MV, Majority Voting; GM, grey matter; FAS, fractional anisotropy skeleton; In, generation of an overt verbal initiation response; Su, generation of an overt verbal suppression response; RI, repetition of "REST" during the initiation condition; RS, repetition of "REST" during the suppression condition; CFI, visual cross-fixation during the initiation condition; CFS, visual cross-fixation during the suppression condition. *Integrated classification accuracy significant at p < 0.05, FWE corrected. Single-modality classification accuracies of the base kernels being integrated are also shown.

Data integrated from seven kernels across three modalities 

Based on the integration of seven kernels encompassing three data types, MKL was unable to increase classification accuracy for any of the diagnostic comparisons relative to the BSMCA (see Figure 5).

FIGURE 3 | (A) Difference between the integrated accuracy achieved using SK, MKL, or AV, and the BSMCA, discriminating UHR and FEP subjects from HCs, and each other, using two-way combinations of sMRI, DTI, and fMRI data. (B) Results of McNemar's tests comparing subject classifications achieved by each integrative method collapsed across SVM contrasts and data combinations. SK, Un-weighted simple sum of kernels; MKL, Multi-Kernel Learning; AV, Prediction Averaging; GM, grey matter; FAS, fractional anisotropy skeleton; In, generation of an overt verbal initiation response; Su, generation of an overt verbal suppression response; RI, repetition of "REST" during the initiation condition; RS, repetition of "REST" during the suppression condition; CFI, visual cross-fixation during the initiation condition; CFS, visual cross-fixation during the suppression condition.

FIGURE 4 | (A) Difference between the integrated accuracy achieved using SK, MKL, AV, or MV, and the BSMCA, discriminating UHR and FEP subjects from HCs, and each other, using three-way combinations of sMRI, DTI, and fMRI data. (B) Results of McNemar's tests comparing subject classifications achieved by each integrative method collapsed across SVM contrasts and data combinations. SK, Un-weighted simple sum of kernels; MKL, Multi-Kernel Learning; AV, Prediction Averaging; MV, Majority Voting; GM, grey matter; FAS, fractional anisotropy skeleton; In, generation of an overt verbal initiation response; Su, generation of an overt verbal suppression response; RI, repetition of "REST" during the initiation condition; RS, repetition of "REST" during the suppression condition; CFI, visual cross-fixation during the initiation condition; CFS, visual cross-fixation during the suppression condition.

PREDICTION AVERAGING

Data integrated from two modalities

Using AV, the ability of sMRI and DTI data combined to differentiate UHR from HC subjects was increased to 71.05%, representing an approximate increase of 3% relative to the BSMCA (see Figure 3). Similarly, combining sMRI and fMRI (contrast In > CFI) data using AV enhanced classification of FEP from HC subjects to 71.05%, again an approximate increase of 3% relative to the BSMCA. When AV was applied to the integration of DTI with fMRI data to discriminate FEP from UHR subjects, classification accuracy increased to 86.67 and 66.67% using the In > CFI and Su > CFS contrasts, respectively, representing approximate increases of 13 and 10% relative to the BSMCA in each case (see Figures 2, 3).
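Because accuracy is a proportion of a small number of subjects, the gains reported above are coarse-grained. The sketch below assumes a hypothetical comparison of 38 subjects (chosen only because 27/38 rounds to the reported 71.05%; the number of subjects entering each comparison is not restated in this section). Under that assumption, one additional correctly classified subject shifts accuracy by roughly 2.6 percentage points.

```python
def accuracy_pct(n_correct: int, n_total: int) -> float:
    """Classification accuracy expressed as a percentage."""
    return 100.0 * n_correct / n_total


# Hypothetical group size, for illustration only.
N_TOTAL = 38

reported = accuracy_pct(27, N_TOTAL)          # 27 of 38 subjects correct
one_subject_step = accuracy_pct(1, N_TOTAL)   # effect of one extra correct subject

print(f"{reported:.2f}% accuracy; one subject = {one_subject_step:.2f} points")
```

Under this assumption, a single reclassified subject is enough to account for an "approximately 3%" change.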



Data integrated from three modalities 

Combining sMRI, DTI, and fMRI (contrast In > CFI) data using AV increased the accuracy with which FEP could be distinguished from HC subjects, and FEP from UHR subjects, to 71.05 and 83.33%, respectively. These represented approximate increases of 3 and 7%, respectively, relative to the BSMCA (see Figures 2, 4).
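This section does not restate exactly which classifier outputs AV averages. The following is a minimal sketch, assuming each single-modality SVM returns a signed decision value for every test subject (positive meaning class +1) and that AV assigns the sign of the mean. The function name `average_predictions` and the input values are illustrative only.

```python
from typing import List


def average_predictions(decision_values: List[List[float]]) -> List[int]:
    """Prediction averaging (AV) across single-modality classifiers.

    decision_values[m][s] is the signed decision value that modality m's
    classifier assigned to test subject s (positive -> class +1).
    Returns one integrated label (+1 or -1) per subject.
    """
    n_models = len(decision_values)
    n_subjects = len(decision_values[0])
    labels = []
    for s in range(n_subjects):
        mean = sum(model[s] for model in decision_values) / n_models
        # A mean of exactly zero is assigned to +1 here (arbitrary convention).
        labels.append(1 if mean >= 0 else -1)
    return labels


# Hypothetical decision values for three test subjects.
smri = [0.8, -0.2, -0.5]
dti = [-0.3, 0.6, -0.1]
print(average_predictions([smri, dti]))  # [1, 1, -1]
```

For the first two subjects the more confident classifier overrides the other. A vote on hard labels from these two modalities would have tied in both cases.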

Data integrated from seven kernels across three modalities

Based on the integration of seven kernels encompassing three data types, AV was unable to increase classification accuracy for any of the diagnostic comparisons relative to the BSMCA (see Figure 5).

FIGURE 5 | (A) Difference between the integrated accuracy achieved using SK, MKL, AV, or MV, and the BSMCA, discriminating UHR and FEP subjects from HCs, and each other, using seven-way combinations of sMRI, DTI, and fMRI data. (B) Results of McNemar's tests comparing subject classifications achieved by each integrative method collapsed across SVM contrasts and data combinations. SK, Un-weighted simple sum of kernels; MKL, Multi-Kernel Learning; AV, Prediction Averaging; MV, Majority Voting; GM, grey matter; FAS, fractional anisotropy skeleton; In, generation of an overt verbal initiation response; Su, generation of an overt verbal suppression response; RI, repetition of "REST" during the initiation condition; RS, repetition of "REST" during the suppression condition; CFI, visual cross-fixation during the initiation condition; CFS, visual cross-fixation during the suppression condition.

MAJORITY VOTING 

Data integrated from three modalities 

Using MV, it was possible to discriminate FEP from HC sub- 
jects with 71.05% accuracy based on the three-way combination 
of sMRI, DTI, and fMRI (contrast In > CFI) data. This repre- 
sented an approximate increase of 3% relative to the BSMCA (see 
Figures 2, 4). 
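Majority voting can be sketched in the same style, assuming each single-modality classifier emits a hard label per subject. With three voters and two classes a tie cannot occur. With two voters, however, any disagreement is a tie. Rather than applying a tie-breaking heuristic, the sketch returns `None` in that case. This is the situation that, as described under the empirical comparison of methods, led MV to be omitted for two-modality combinations. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_vote(labels_per_model: Sequence[Sequence[int]]) -> List[Optional[int]]:
    """Majority voting (MV) across single-modality classifiers.

    labels_per_model[m][s] is the hard label modality m assigned to subject s.
    Returns the winning label per subject, or None where the vote is tied.
    """
    result: List[Optional[int]] = []
    for votes in zip(*labels_per_model):  # one tuple of votes per subject
        ranked = Counter(votes).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(None)  # tie: a tie-breaking heuristic would be needed
        else:
            result.append(ranked[0][0])
    return result


# Three modalities: with two classes, a tie is impossible.
print(majority_vote([[1, -1, 1], [1, 1, -1], [-1, -1, 1]]))  # [1, -1, 1]
# Two modalities: every disagreement is a tie.
print(majority_vote([[1, -1], [-1, -1]]))                    # [None, -1]
```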

Data integrated from seven kernels across three modalities 

Based on the integration of seven kernels encompassing three data 
types, MV was unable to enhance classification accuracy for any of 
the diagnostic comparisons relative to the BSMCA (see Figure 5). 

AN EMPIRICAL COMPARISON OF METHODS 

Data combined from two modalities 

Collapsed across all comparisons, the results of the McNemar's 
tests comparing subject classifications made by each method for 
all two-way combinations, though not statistically significant, 
gave the following best-to-worst ranking of methods based on 
their respective p-values: AV, MKL, SK (see Figure 3B). As noted, MV was not performed for data combined from two modalities because of the potential over-influence of the heuristic used when ties occur. However, in terms of the greatest individual accuracy increases achieved, relative to the BSMCA, MKL was broadly
outperformed by both AV and SK (see Figure 3). Across diag- 
nostic comparisons, greater integrated accuracies were achieved 
for the FEP vs. UHR comparison by each of the three integrative 
approaches, relative to the other two diagnostic comparisons. 
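The section reports McNemar's tests but not which variant was used. As a sketch only, the exact (binomial) form compares two methods on the same subjects using just the discordant pairs. Here b is the number of subjects that method A classified correctly and method B did not, and c is the number for the reverse case. The per-subject outcome vectors below are made up for illustration.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar test on paired per-subject outcomes.

    correct_a[i] / correct_b[i] are True where method A / B classified
    subject i correctly (same subject order in both sequences).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("outcomes must be paired subject by subject")
    # Only discordant subjects (one method right, the other wrong) matter.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # identical outcomes: no evidence of a difference
    # Under H0 each discordant subject favours A or B with probability 1/2.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# A is right on 5 subjects where B is wrong; they agree on the rest.
a = [True] * 30 + [False] * 8
b_out = [True] * 25 + [False] * 5 + [False] * 8
print(mcnemar_exact(a, b_out))  # 0.0625
```

Even five discordant subjects that all favour one method give p = 0.0625. With groups of this size, small accuracy differences are therefore hard to detect with this test.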

Data combined from three modalities 

Collapsed across all comparisons, the results of McNemar's tests comparing subject classifications made by each method for all three-way combinations, though not all significant, gave a best-to-worst ranking of methods based on their respective p-values: AV, MV, MKL, and SK. Consistent with this, in terms of the greatest individual increases in classification accuracy relative to the BSMCA, AV performed better than MV, which in turn performed better than SK and MKL, neither of which was able to increase classification accuracy above the BSMCA. Across diagnostic comparisons, and consistent with the two-way combinations, the best integrated accuracies were generally achieved for the FEP vs. UHR comparison, followed by the FEP vs. HC, and then UHR vs. HC comparisons.

Data combined from seven kernels encompassing three modalities 

Collapsed across all comparisons, the results of McNemar's tests comparing subject classifications made by each method, though not statistically significant, gave a best-to-worst ranking of methods based on their respective p-values: AV, MKL, MV, and SK. As shown in Figure 5, however, no integrative method was able to enhance classification accuracy relative to the BSMCA. The best result was achieved by AV for the FEP vs. UHR comparison, where it only matched the BSMCA. In terms of the smallest reduction in classification accuracy relative to the BSMCA, AV performed better than MV, MV better than MKL, and MKL better than SK. Across diagnostic comparisons, consistent with the two- and three-way combinations, the best integrated accuracies (i.e., the smallest decreases relative to the BSMCA) were generally achieved for the FEP vs. UHR comparison, followed by the FEP vs. HC, and then UHR vs. HC comparisons.

INTEGRATED CLASSIFICATION ACCURACIES 

The results of McNemar's tests with Holm-Bonferroni correction, comparing classification accuracies with the corresponding BSMCA, found none of the differences (either decreases or increases) to be statistically significant (p > 0.05).
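Holm-Bonferroni correction is a step-down procedure. The following is a minimal sketch in the same style, assuming one raw p-value per integrated-versus-BSMCA comparison. The p-values in the usage line are invented for illustration.

```python
from typing import List, Sequence


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure: True where the null hypothesis is rejected.

    Controls the family-wise error rate at `alpha` across all tests.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest p-value is tested at alpha/m, the next at alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first failure; all larger p-values are retained
    return reject


print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```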

DISCUSSION 

In the current study we performed an empirical comparison of 
the relative abilities of four distinct methodological approaches to 
increase the classification accuracy of SVM by combining multimodal
neuroimaging data. Specifically, each method was applied
to three separate diagnostic comparisons related to the early 
stages of psychosis, utilizing combinations of data from sMRI, 
fMRI, and/or DTI modalities. The most striking feature of our 
results is that in most cases, the BSMCA provided higher clas- 
sification accuracy than any multi-modal combination method. 
With regard to the few contrasts that did improve, we note that, 
although none of the differences were statistically significant: (i) 
in agreement with our first hypothesis, an un-weighted simple 
sum of kernels appeared to perform slightly better than the relatively
more sophisticated weighted multi-kernel learning approach for
integrating two modalities, (ii) inconsistent with our first hypoth- 
esis AV appeared to be slightly more effective than either SK 
or MKL for increasing classification accuracy across two and 
three modality combinations, (iii) in agreement with our second 
hypothesis, AV also appeared to perform slightly better than SK in 
terms of the number and magnitude of classification increases for 
integrating three modalities, (iv) contrary to our second hypoth- 
esis, MKL was unable to provide any increase in classification 
accuracy relative to the BSMCA, but MV was able to produce
slight improvements in some cases, and (v) in agreement with our 
third hypothesis, we found that the performance of each approach 
was dependent on the diagnostic comparison to which they were 
applied. 

Taken together the results suggest that whilst the integration 
of different data types can enhance classification accuracy, the 
frequency of such instances may be limited. As a consequence, 
our results suggest that for small to moderately sized clinical neu- 
roimaging datasets, combining different imaging modalities in 
a data-driven manner is not a "magic bullet" to increase classi- 
fication accuracy. The findings also highlight that the potential 
of each integrative method to enhance classification accuracy 
appears to be: (i) differentially suited to different diagnostic com- 
parisons, (ii) influenced by the number of different data types 
being integrated, and (iii) influenced by the specific types of data 
being integrated. 

With respect to the influence of diagnostic comparison, for
example, Figures 3-5 show that the ability of each of the four
integrative methods to enhance classification accuracy differs
depending on the comparison to which it is applied.
This may be related to the degree of complementary information 
provided by the data modalities to discriminate each diagnos- 
tic comparison (i.e., two different classifiers which make the 
same predictions for the same subjects have little complementary 
information to add to one another). 
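The point about complementary information can be made concrete with a hypothetical diagnostic that was not used in this study. Given per-subject correct/incorrect outcomes for two single-modality classifiers, it measures how often they disagree and how many subjects at least one of them gets right. For binary labels, combining the two predictions by averaging or voting cannot classify a subject correctly when both classifiers are wrong. The second figure is therefore a ceiling for such prediction-level fusion. Kernel-level approaches (SK, MKL) retrain a classifier and are not bounded in this way. The function name and inputs are illustrative.

```python
from typing import Sequence, Tuple


def complementarity(correct_a: Sequence[bool],
                    correct_b: Sequence[bool]) -> Tuple[float, float]:
    """Summarise how complementary two classifiers' per-subject outcomes are.

    Returns (disagreement, oracle):
      disagreement: fraction of subjects exactly one classifier gets right;
      oracle: fraction at least one classifier gets right, a ceiling for
              fusing these two binary predictions by averaging or voting.
    """
    pairs = list(zip(correct_a, correct_b))
    n = len(pairs)
    disagreement = sum(1 for a, b in pairs if a != b) / n
    oracle = sum(1 for a, b in pairs if a or b) / n
    return disagreement, oracle


# Identical outcomes: nothing to gain by combining.
print(complementarity([True, True, False], [True, True, False]))
# Complementary outcomes: prediction-level fusion could, at best, reach 75%.
print(complementarity([True, True, False, False], [True, False, True, False]))
```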

With respect to the second factor, the assumption that greater 
kernel numbers will necessarily result in greater accuracy is not 
supported by the results here, and simply adding more data 
modalities may only contribute noise that impairs the ability 
of the SVM to discriminate classes, possibly by increasing the 
uncertainty with which parameters are estimated during train- 
ing. Consistent with this interpretation, the integrations with 
the greatest increases were based on data combined from two, rather
than three, modalities (see Figures 3, 4), and when all
seven available kernels were used, no integrative method was able
to increase classification accuracy. This may be particularly true
if the data modalities being added do not enable classes to be
discriminated in isolation. Note, however, that if kernels are pres- 
elected based on their ability to discriminate classes, this must not 
be done based on performance on the test data. 
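The caveat about preselecting kernels can be illustrated with a leave-one-out loop in which selection happens inside each training fold. This sketch shows only the general principle and is not the procedure used in this study. `score_on` and `predict` are hypothetical callables that stand in for whatever kernel-scoring and SVM training steps are used.

```python
from typing import Callable, List, Sequence, Tuple


def loo_with_inner_selection(
    n_subjects: int,
    kernels: Sequence[str],
    score_on: Callable[[str, List[int]], float],
    predict: Callable[[str, List[int], int], int],
) -> List[Tuple[int, str, int]]:
    """Leave-one-out CV in which the kernel is chosen from training subjects only.

    score_on(kernel, train_idx): hypothetical; estimates that kernel's accuracy
        using the training subjects alone (e.g. by an inner cross-validation).
    predict(kernel, train_idx, test_idx): hypothetical; trains on train_idx and
        labels the held-out subject.
    """
    results = []
    for test_idx in range(n_subjects):
        train_idx = [i for i in range(n_subjects) if i != test_idx]
        # Selection never sees the held-out subject, so test labels cannot leak.
        best = max(kernels, key=lambda k: score_on(k, train_idx))
        results.append((test_idx, best, predict(best, train_idx, test_idx)))
    return results


# Toy stand-ins: fixed training-set scores and a dummy predictor.
toy_scores = {"GM": 0.60, "FAS": 0.70}
out = loo_with_inner_selection(
    4, ["GM", "FAS"],
    score_on=lambda k, train: toy_scores[k],
    predict=lambda k, train, test: 1,
)
print(out[0])  # (0, 'FAS', 1)
```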

The results also support the notion that, for the sample size
investigated and when fewer data types are being integrated, less
computationally complex techniques such as prediction averaging and
a simple summing of kernels may provide comparable, if not 
greater, levels of integrated accuracy in comparison to more com- 
putationally complex approaches such as MKL. This is probably 
because MKL requires the estimation of more parameters than 
can practically be estimated from the small sample size we investi- 
gated. This is also consistent with Lewis et al. who reported similar 
findings based on their application of integrated SVM techniques 
to protein interaction prediction (Lewis et al., 2006). More recent 
studies also support this, suggesting that MKL may instead be bet- 
ter suited to larger sample sizes (Damoulas and Girolami, 2008). 
One proposed benefit of MKL that is still partially evident in the 
data here, however, is the explicit ability to down-regulate the 
weight of a "noisy" data set, whilst still utilizing its complemen- 
tary information. For example, MKL was able to combine the 
fMRI contrast, Su > CFS, which by itself had only been able to 
classify UHR from HC subjects with a statistically non-significant
accuracy of 60.53%, with sMRI data and enhance overall clas- 
sification accuracy by approximately 5% relative to the BSMCA 
(see Figure 3). In comparison, using the same two-way data com- 
bination, the remaining integrative techniques were unable to 
enhance classification accuracy, possibly due to the un-weighted 
contribution of the fMRI contrast acting as "noisy" interference. 
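SK and MKL differ in how the kernel matrices are combined before a single SVM is trained. The following is a minimal sketch, assuming precomputed kernel matrices stored as nested lists. SK adds them with equal weight. MKL instead learns one non-negative weight per kernel (often constrained to sum to one), and this is what allows a noisy kernel to be down-weighted. The weight-learning step itself is not shown. The weights and matrices in the usage lines are chosen by hand for illustration.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix],
                    weights: Optional[Sequence[float]] = None) -> Matrix:
    """Combine precomputed n x n kernel (Gram) matrices into one.

    weights=None gives the un-weighted sum (SK). Passing non-negative weights,
    e.g. as learned by an MKL solver (not implemented here), gives a weighted sum.
    """
    if weights is None:
        weights = [1.0] * len(kernels)
    if len(weights) != len(kernels) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per kernel")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


k_smri = [[2.0, 1.0], [1.0, 2.0]]
k_fmri = [[4.0, -3.0], [-3.0, 4.0]]   # hypothetical "noisy" kernel
print(combine_kernels([k_smri, k_fmri]))               # [[6.0, -2.0], [-2.0, 6.0]]
print(combine_kernels([k_smri, k_fmri], [0.75, 0.25]))  # [[2.5, 0.0], [0.0, 2.5]]
```

With equal weights, the hypothetical noisy kernel dominates the off-diagonal similarity. Down-weighting it reduces its influence on the combined matrix.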

In addition to the number of modalities, the specific types of
modality being integrated appear to be a third important factor.
For example, whilst the combination of DTI and fMRI data using SK
and AV provided approximate increases of 10 and 13%, respectively,
in the differentiation of FEP and UHR subjects, combining sMRI with DTI
data, or the same fMRI contrast, using the same integrative meth- 
ods, did not result in a similarly increased classification accuracy 
(see Figure 3). As above, this is probably related to the degree of 
complementary information each data type offers another for
discriminating each contrast. For example, alterations in GM, WM,
and/or neurofunction are unlikely to proceed equally across each
stage of psychosis (e.g., GM alterations may occur sooner in the
psychosis time course than changes in functional activation). As
such, different data types may classify a given subject more, or
less, easily depending on their specific psychotic stage. When
considering SVM as a research, and potentially real-world clinical,
tool, therefore, it should be emphasized that the highest accuracy
for classifying a given pair of clinical groups is likely to be
associated with specific data types, which should be identified.
In this context, it is worth noting that should different data types not have
any complementary information to add to one another, it would 
not be possible for an integrative approach to yield an accuracy 
greater than the relevant BSMCA. Hence, whilst the groups used 
here were smaller than those used in more traditional machine 
learning applications (e.g., Guodong et al., 2000), we suggest 
that the observed results are more likely to be a reflection of the 
intrinsic properties of the input data used rather than sample 
size, the likelihood being that the different modalities combined
simply had little, or no, complementary information to offer one
another. It therefore remains possible that an integrative approach
would have provided better results had more diverse types of data
been combined (e.g., genetic, cognitive, neuroimaging).

In summary, our results show that whilst integration has the 
potential to increase classification accuracy (by up to 13% in 
our data), such increases represent the exception rather than the 
norm. Rather, these data suggest that in the majority of cases 
single modality classification may provide the highest accuracy, 
with integration predominantly resulting in a decrease relative to 
the BSMCA. In addition, these findings emphasize the substantial
impact of a range of factors on the ability of integration per se to
increase classification accuracy, including the method used, the
diagnostic groups being classified, and the number and types of
data being combined.

LIMITATIONS 

The study's main limitation may be considered to be the relatively 
small size of the clinical and control groups used for classification, 
thereby limiting the generalizability of any generated classifier(s) 
to future psychosis subjects. Consistent with Nieuwenhuis et al. 
(2012), it is possible that consistently higher integrated accura- 
cies may have been evident had a larger sample size been used; 
potentially by counteracting increased noise generated as a result 
of combining multiple modalities. Nevertheless, it remains that 
the sample used here is comparable with the majority of stud- 
ies that, in recent years, have used SVM to discriminate between 
patients and controls or between different clinical groups (Orru
et al., 2012). A second limitation is the potential impact on the
findings of having more features (e.g., voxels) than samples
(e.g., subjects), commonly referred to as the "curse of
dimensionality." This is offset here by the fact that classification
was performed using a linear kernel formulation, which limits the
number of SVM parameters to be optimized to the number of samples
plus one. Even so, using so many features makes the classifier more
vulnerable to noise, a problem potentially exacerbated by combining
multiple modalities.
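The parameter count mentioned above follows from the SVM's dual (kernel) form. The following is a minimal sketch, assuming the standard formulation in which a trained linear SVM is summarised by one coefficient per training subject (alpha_i * y_i, zero for non-support vectors) plus a bias. The Gram matrix is n x n regardless of the number of voxels, so adding features does not add parameters. Each kernel entry, however, still sums over every (possibly noisy) feature. The coefficient values below are arbitrary, not trained.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def linear_gram(X: Sequence[Vector]) -> List[List[float]]:
    """Linear-kernel Gram matrix K[i][j] = <x_i, x_j>; its size is n x n."""
    return [[dot(xi, xj) for xj in X] for xi in X]


def dual_decision(coef: Sequence[float], bias: float,
                  X_train: Sequence[Vector], x: Vector) -> float:
    """f(x) = sum_i coef_i * <x_i, x> + bias, where coef_i = alpha_i * y_i.

    The model is fully described by n coefficients plus one bias,
    however many features each subject has.
    """
    return sum(c * dot(xi, x) for c, xi in zip(coef, X_train)) + bias


# Three hypothetical subjects with 10,000 features each (e.g. voxels).
X = [[1.0] * 10_000, [0.0] * 10_000, [2.0] * 10_000]
K = linear_gram(X)
print(len(K), len(K[0]))                                        # 3 3
print(dual_decision([1.0, 0.0, -0.5], 0.5, X, [1.0] * 10_000))  # 0.5
```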

CONCLUSION 

In the current study we performed an empirical comparison of 
four distinct approaches for increasing SVM classification accu- 
racy by integrating data from multiple sources on a dataset of 
a similar size to many clinical neuroimaging studies. Following 
individual application to three separate diagnostic comparisons 
related to the early stages of psychosis, we demonstrated that the 
specific integrative approach used, the number of data types inte- 
grated, and also the diagnostic comparison to which they are 
applied, all appear to have substantial impact on the integrated 
accuracy achieved. Most importantly, it appears that using a
single-modality, single-kernel classifier often provides the best results,
suggesting that combining different imaging modalities in a data- 
driven manner is not a "magic bullet" to increase classification 
accuracy for moderately sized clinical data sets. It remains
possible, however, that this conclusion is dependent on the use of
neuroimaging modalities that had little, or no, complementary
information to offer one another, and that the integration of more
diverse types of data would have produced greater classification
enhancement. We suggest that future studies should ideally examine a
greater variety of data types (e.g., genetic, cognitive, and
neuroimaging) in order to identify the data types and combinations
optimally suited to the classification of early stage psychosis.